Abstract

Reversible jump MCMC (RJ-MCMC) sampling techniques, which make it possible to jointly tackle model selection and parameter estimation problems in a coherent Bayesian framework, have become increasingly popular in the signal processing literature since the seminal paper of Andrieu and Doucet (IEEE Trans. Signal Process., 47(10), 1999). Crucial to the implementation of any RJ-MCMC sampler is the computation of the so-called Metropolis-Hastings-Green (MHG) ratio, which determines the acceptance probability for the proposed moves.

It turns out that the expression of the MHG ratio that was given in the paper of Andrieu and Doucet for "Birth-or-Death" moves (the simplest kind of trans-dimensional move, used in virtually all applications of RJ-MCMC to signal decomposition problems) was erroneous. Unfortunately, this mistake has been reproduced in many subsequent papers dealing with RJ-MCMC sampling in the signal processing literature.

This note discusses the computation of the MHG ratio, with a focus on the case where 
the proposal kernel can be decomposed as a mixture of simpler kernels, for which the MHG 
ratio is easy to compute. We provide sufficient conditions under which the MHG ratio of 
the mixture can be deduced from the MHG ratios of the elementary kernels of which it is 
composed. As an application, we consider the case of Birth-or-Death moves, and provide a 
corrected expression for the erroneous ratio in the paper of Andrieu and Doucet. 

1 Introduction 

Model selection and parameter estimation are fundamental tasks arising in many (if not all) signal processing problems, when parametric models are employed. Let us consider a collection of models $\{\mathcal{M}_k, k \in \mathcal{K}\}$, indexed by some finite or countable set $\mathcal{K} \subset \mathbb{N}$, with parameter vector $\theta_k \in \Theta_k \subset \mathbb{R}^{d_k}$ under model $\mathcal{M}_k$. In a Bayesian framework, model selection (or averaging) and parameter estimation can in principle be carried out jointly, using the posterior distribution of the pair $(k, \theta_k)$,

$$\pi(k, \theta_k) \propto p(y \mid k, \theta_k)\, p(k, \theta_k), \tag{1}$$
where $y$ is the observed data and $\propto$ indicates proportionality. Note that the distribution $\pi$ is defined on $\mathsf{X} = \bigcup_{k \in \mathcal{K}} \{k\} \times \Theta_k$, which is a disjoint union of subspaces with differing dimensionality. Generic Markov Chain Monte Carlo (MCMC) methods for probability distributions defined on such spaces became available during the 1990s, most notably Green's widely applicable RJ-MCMC sampler [17], making it possible to use a fully Bayesian approach for model selection (or averaging) and parameter estimation in all sorts of applications. The reader is referred to [4, 9, 18, 37, 45] for a broader view on trans-dimensional sampling techniques (including alternatives to the RJ-MCMC sampler).

Green's RJ-MCMC sampler can be seen as a generalization of the well-known Metropolis-Hastings sampler [19, 28], which is capable of exploring not only the fixed-dimensional parameter spaces $\Theta_k$, but also the space $\mathcal{K}$ of all models under consideration. This algorithm relies on an accept/reject mechanism, with an acceptance ratio calibrated in such a way that the invariant distribution of the chain is the target distribution $\pi$. The computation of this acceptance ratio for trans-dimensional moves is in general a delicate issue¹, involving measure-theoretic considerations.

Andrieu and Doucet [1] pioneered the use of RJ-MCMC sampling in "signal decomposition" 
problems, by tackling joint model selection and parameter estimation for an unknown number of 
sinusoidal signals observed in white Gaussian noise. (At the same period, RJ-MCMC also be- 
came popular for image processing tasks such as segmentation and object recognition; see, e.g., 
[11, 21, 32, 33, 41].) This seminal paper was followed by many others in the signal processing
literature [3, 5, 6, 10, 20, 25-27, 30, 31, 40, 42, 43], relying systematically on the original paper [1] 
for the computation of the acceptance ratio of "Birth-or-Death" moves — the most elementary type 
of trans-dimensional move, which either adds or removes a component from the signal decompo- 
sition. Unfortunately, the expression of the acceptance ratio for Birth-or-Death moves provided 
by [1, Equation (20)] turns out to be erroneous, as will be explained later. Worse, the exact same 
mistake has been reproduced in most of the following papers, referred to above. 

The aim of this note is to provide clear statements of some mathematical results, perhaps 
not completely new but never stated explicitly, which can be used for a clean justification of 
the acceptance ratio of Birth-or-Death moves in signal decomposition (and similar) problems. 
Section 2 recalls, very quickly, the basics of MCMC methods, with a focus on Metropolis-Hastings 
algorithms on general state spaces (also known as RJ-MCMC algorithms). Section 3 discusses the 
computation of the acceptance ratio for mixture kernels, and provides conditions under which the 
ratio of the mixture can be directly derived from the ratio of the elementary kernels of which it 
is composed. Section 4 defines Birth-or-Death moves and provides the expression of the ratio; 
several distinct but related mathematical representations ("unsorted vectors", "sorted vectors" and point processes) are discussed. As an illustration, Section 5 returns to the problem considered in [1] and provides a corrected expression for the Birth-or-Death ratio. Section 6 concludes the paper.

2 Background on MCMC methods 

This section recalls basic definitions and results for the MCMC method. The reader is referred 
to [16-18, 29, 37, 38, 48, 49] for more detailed explanations. 

¹ Fortunately, the simple and powerful "dimension matching" argument [17] makes it possible to bypass this difficulty for a large class of proposal distributions.
2.1 MCMC with reversible kernels

Let $\pi$ be a probability distribution on a measurable space $(\mathsf{X}, \mathcal{B})$, which is to be sampled from. MCMC sampling methods proceed by constructing a time-homogeneous Markov chain $(x_n)$ with invariant distribution $\pi$, using a transition kernel $P$ that is reversible with respect to $\pi$, i.e., a kernel that satisfies the detailed balance condition

$$\pi(dx)\, P(x, dx') = \pi(dx')\, P(x', dx). \tag{2}$$

For all measurable sets $A \in \mathcal{B}$, integrating (2) on $\mathsf{X} \times A$ yields

$$\int_{\mathsf{X}} \pi(dx)\, P(x, A) = \pi(A),$$

which means that $\pi$ is an invariant distribution for the kernel $P$ (it is also said that "$P$ leaves $\pi$ invariant").
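On a finite state space, kernels are stochastic matrices and $\pi(dx)\, P(x, dx')$ becomes the matrix $\pi(x)\, P(x, x')$, so the argument above can be checked numerically. The following minimal sketch works under that finite-space assumption. It uses a random walk on a small weighted graph, which is a hypothetical example and not taken from this note. This is a classical reversible kernel whose invariant distribution $\pi(x)$ is proportional to the total edge weight at $x$.

```python
import numpy as np

# Hypothetical symmetric edge weights on a 3-state graph; the self-loop on
# state 2 makes the chain aperiodic.
w = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 1.0]])
deg = w.sum(axis=1)
P = w / deg[:, None]        # P(x, x') = w(x, x') / w(x), a stochastic matrix
pi = deg / deg.sum()        # candidate invariant law, pi(x) proportional to w(x)

flow = pi[:, None] * P                        # flow[x, x'] = pi(x) P(x, x')
detailed_balance = np.allclose(flow, flow.T)  # condition (2)
invariant = np.allclose(pi @ P, pi)           # sum_x pi(x) P(x, A) = pi(A)
# The chain is irreducible and aperiodic, so the rows of P^n approach pi.
Pn = np.linalg.matrix_power(P, 200)
print(detailed_balance, invariant, np.allclose(Pn, pi[None, :]))  # True True True
```

The symmetry of `flow` is exactly condition (2). Invariance then follows by summing over the first index, which mirrors the integration over $\mathsf{X} \times A$.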

If the transition kernel $P$ is $\pi$-irreducible and aperiodic, then [48, Theorem 1] $\pi$ is the unique invariant distribution and the chain converges in total variation to $\pi$ for $\pi$-almost all starting states $x$. If $P$ is also Harris recurrent, then convergence occurs for all initial distributions [37, Theorem 6.51].

Remark Some of the above requirements on the chain $(x_n)$ can be relaxed. Most notably, time-inhomogeneous chains are used in the context of "adaptive MCMC" algorithms; see, e.g., [2, 7] and the references therein. It is also possible to depart from the reversibility assumption, which is a sufficient but not necessary condition for $\pi$ to be an invariant distribution (see, e.g., [13]), though the vast majority of MCMC algorithms considered in the literature are based on reversible kernels.



2.2 Metropolis-Hastings-Green kernels 

The very popular Metropolis-Hastings-Green kernels, sometimes simply called Metropolis-Hastings kernels, correspond to the following two-stage sampling procedure. First, given that the current state of the Markov chain is $x \in \mathsf{X}$, a new state $x' \in \mathsf{X}$ is proposed from a transition kernel $Q(x, dx')$. Second, this move is accepted with probability $\alpha(x, x')$ and rejected otherwise, in which case the new state is equal to $x$. More formally, for all $x \in \mathsf{X}$ and $B \in \mathcal{B}$, the transition kernel is given by

$$P(x, B) = \int_B Q(x, dx')\, \alpha(x, x') + s(x)\, \mathbb{1}_B(x), \tag{3}$$

where $\mathbb{1}_B$ denotes the indicator function of $B$, and

$$s(x) = \int_{\mathsf{X}} Q(x, dx')\, \big(1 - \alpha(x, x')\big)$$

is the probability of rejection at $x$. It is easily seen that the detailed balance condition (2) holds if and only if [17, 48, 49]

$$\pi(dx)\, Q(x, dx')\, \alpha(x, x') = \pi(dx')\, Q(x', dx)\, \alpha(x', x). \tag{4}$$

This is achieved, for instance, by the acceptance probability

$$\alpha(x, x') = \min\{1, r(x, x')\}, \tag{5}$$

where $r(x, x')$ denotes the Metropolis-Hastings-Green (MHG) ratio

$$r(x, x') = \frac{\pi(dx')\, Q(x', dx)}{\pi(dx)\, Q(x, dx')}. \tag{6}$$

The right-hand side of (6) is the Radon-Nikodym derivative of $\pi(dx')\, Q(x', dx)$ with respect to $\pi(dx)\, Q(x, dx')$; see [49, Section 2] for technical details.

Remark It is proved in [49, Section 4] that the acceptance probability (5) is optimal in the sense of minimizing the asymptotic variance of sample path averages among all acceptance rates satisfying (4).
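On a finite state space, the integral in (3) becomes a sum and the ratio (6) becomes a ratio of probabilities, so the whole construction fits in a few lines. The minimal sketch below assumes a finite state space, a strictly positive target `pi` and a proposal matrix `Q`; both are hypothetical, and the function name `mhg_kernel` is ours. It builds the transition matrix (3) from (5)-(6) and checks detailed balance (2) numerically.

```python
import numpy as np

def mhg_kernel(pi, Q):
    """Transition matrix (3) built from proposal Q, acceptance (5) and ratio (6).

    Assumes a finite state space and pi[x] > 0 for every state x.
    """
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if y != x and Q[x, y] > 0:
                r = (pi[y] * Q[y, x]) / (pi[x] * Q[x, y])  # MHG ratio (6)
                P[x, y] = Q[x, y] * min(1.0, r)           # propose, then accept
        # Remaining mass: proposals of x itself plus the rejection probability s(x).
        P[x, x] = 1.0 - P[x].sum()
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.array([[0.0, 0.5, 0.5, 0.0],   # a non-symmetric proposal; rows sum to 1
              [0.2, 0.0, 0.4, 0.4],
              [0.3, 0.3, 0.0, 0.4],
              [0.0, 0.5, 0.5, 0.0]])
P = mhg_kernel(pi, Q)
flow = pi[:, None] * P
print(np.allclose(flow, flow.T), np.allclose(pi @ P, pi))  # True True
```

The double loop is written for readability, not speed. With high-dimensional or very peaked targets, one would compute the ratio from log-densities to avoid underflow.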



3 Mixture of proposal kernels 

3.1 Metropolis-Hastings-Green ratio for mixture of proposal kernels 

It is often convenient to consider a proposal kernel $Q$ built as a mixture of simpler transition kernels $Q_m$, with $m$ in some finite or countable index set $\mathsf{M}$. In this case we have

$$Q(x, dx') = \sum_{m \in \mathsf{M}} j(x, m)\, Q_m(x, dx'), \tag{7}$$

where $j(x, m)$ is the probability of choosing the move type $m$ given that the current state is $x$. Note that the actual value of $Q_m(x, \cdot)$ is irrelevant when $j(x, m) = 0$.

It turns out that, under some assumptions, the MHG ratio for a mixture kernel $Q$ can be conveniently deduced from the elementary ratios computed for each individual kernel $Q_m$, using the formula

$$r(x, x') = \frac{j(x', m')\, \pi(dx')\, Q_{m'}(x', dx)}{j(x, m)\, \pi(dx)\, Q_m(x, dx')}, \tag{8}$$

where $m \in \mathsf{M}$ denotes the specific move that has been used to propose $x'$, and $m' \in \mathsf{M}$ is the corresponding "reverse move". Equation (8) is routinely used in applications of the RJ-MCMC algorithm. It is alluded to in Green's paper [17, p. 717] in the sentence: "If [other] discrete variables are generated in making proposals, the probability functions of their realized values are multiplied into the move probabilities". It is, however, wrong in general. Sufficient conditions for Equation (8) to hold are provided by the following result:

Proposition 1. Let

$$R_m(dx, dx') = j(x, m)\, \pi(dx)\, Q_m(x, dx').$$

Assume that there exists a family of disjoint sets $W_m \in \mathcal{B} \otimes \mathcal{B}$, indexed by $\mathsf{M}$, such that:

i) For each $m \in \mathsf{M}$, $R_m$ is supported by $W_m$, which means $R_m(\mathsf{X}^2 \setminus W_m) = 0$.

ii) Each move $m \in \mathsf{M}$ has a unique "reverse move" $\varphi(m) \in \mathsf{M}$ in the sense that $W_{\varphi(m)} = W'_m$, where $W'_m = \{(x', x) : (x, x') \in W_m\}$.

Then the MHG ratio is given by Equation (8) with $m' = \varphi(m)$.

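Proof. Let $R(dx, dx') = \pi(dx)\, Q(x, dx') = \sum_{m \in \mathsf{M}} R_m(dx, dx')$. For $R$-almost every $(x, x') \in \mathsf{X}^2$, there is a unique $m = m_{x,x'} \in \mathsf{M}$ such that $(x, x') \in W_m$. Equation (8) can be rewritten as:

$$r(x, x') = \frac{R_{\varphi(m_{x,x'})}(dx', dx)}{R_{m_{x,x'}}(dx, dx')}.$$

Assumption ii) implies $W_{\varphi(\varphi(m))} = W_m$, so $\varphi$ is an involution (hence a bijection) of $\mathsf{M}$. On $W_m$, the measure $R$ coincides with $R_m$, since the other $R_{m_0}$ are supported by sets disjoint from $W_m$. Then, for all $A \in \mathcal{B} \otimes \mathcal{B}$,

$$\begin{aligned}
\iint_A r(x, x')\, R(dx, dx')
&= \sum_{m \in \mathsf{M}} \iint_{A \cap W_m} \frac{R_{\varphi(m)}(dx', dx)}{R_m(dx, dx')}\, R_m(dx, dx') \\
&= \sum_{m \in \mathsf{M}} \iint_{A \cap W_m} R_{\varphi(m)}(dx', dx) \\
&= \sum_{m_0 \in \mathsf{M}} \iint_{A \cap W'_{m_0}} R_{m_0}(dx', dx) \qquad \text{with } m_0 = \varphi(m), \text{ because } W_m = W'_{\varphi(m)} \\
&= \iint_A R(dx', dx).
\end{aligned}$$

Hence $r$ is the Radon-Nikodym derivative of $R(dx', dx)$ with respect to $R(dx, dx')$, that is, the MHG ratio (6). □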
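Proposition 1 can be illustrated on a hypothetical three-state toy example. Every kernel is a matrix, and the selection probabilities $j(x, m)$ are assumed not to depend on $x$. With only the moves "up" ($x \to x+1 \bmod 3$) and "down" ($x \to x-1 \bmod 3$), the sets $W_m$ are disjoint and satisfy ii) with $\varphi(\text{up}) = \text{down}$. In that case (8) coincides with the MHG ratio (6) of the mixture. Adding a third move that proposes any other state uniformly breaks the disjointness, and (8) then differs from (6). All helper names below are ours.

```python
import numpy as np

def shift_kernel(n, step):
    """Deterministic kernel x -> x + step (mod n), as an n x n stochastic matrix."""
    return np.roll(np.eye(n), step, axis=1)

def mixture(kernels, probs):
    """Mixture kernel (7) with state-independent selection probabilities j(m)."""
    return sum(probs[m] * kernels[m] for m in kernels)

def mixture_ratio(pi, Q, x, y):
    """MHG ratio (6) of the mixture kernel Q."""
    return pi[y] * Q[y, x] / (pi[x] * Q[x, y])

def per_move_ratio(pi, kernels, probs, reverse, m, x, y):
    """Formula (8): move m proposed y from x, and reverse[m] is its reverse move."""
    mr = reverse[m]
    return (probs[mr] * pi[y] * kernels[mr][y, x]) / (probs[m] * pi[x] * kernels[m][x, y])

pi = np.array([0.2, 0.3, 0.5])
n = len(pi)
kernels = {"up": shift_kernel(n, 1), "down": shift_kernel(n, -1),
           "unif": (np.ones((n, n)) - np.eye(n)) / (n - 1)}
reverse = {"up": "down", "down": "up", "unif": "unif"}

# Disjoint supports: only "up" and "down" are ever selected.
probs1 = {"up": 0.7, "down": 0.3, "unif": 0.0}
r6_disjoint = mixture_ratio(pi, mixture(kernels, probs1), 0, 1)
r8_disjoint = per_move_ratio(pi, kernels, probs1, reverse, "up", 0, 1)

# Overlapping supports: "unif" can also propose the move 0 -> 1.
probs2 = {"up": 0.5, "down": 0.2, "unif": 0.3}
r6_overlap = mixture_ratio(pi, mixture(kernels, probs2), 0, 1)
r8_overlap = per_move_ratio(pi, kernels, probs2, reverse, "up", 0, 1)

print(r6_disjoint, r8_disjoint)  # equal, about 0.643
print(r6_overlap, r8_overlap)    # about 0.808 versus 0.6
```

This only compares the two formulas for a single pair of states. It does not simulate the chains obtained from them.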



3.2 Mixture representation of trans-dimensional kernels 

Consider the case of a variable-dimensional space that can be written as $\mathsf{X} = \bigcup_{k \in \mathcal{K}} \{k\} \times \Theta_k$, with $\mathcal{K}$ a finite or countable set (usually $\mathcal{K} \subset \mathbb{N}$) and $\Theta_k \subset \mathbb{R}^{d_k}$. A point $x \in \mathsf{X}$ is a pair $(k, \theta)$ with $k \in \mathcal{K}$ and $\theta \in \Theta_k$. The problem of sampling a (posterior) distribution on such a space typically occurs in the context of Bayesian model selection or averaging.

Set $\mathsf{X}_k = \{k\} \times \Theta_k$. Any kernel $Q$ on $\mathsf{X}$ admits a natural representation as a mixture of fixed-dimensional and trans-dimensional kernels:

$$Q(x, dx') = \sum_{(k, l) \in \mathcal{K}^2} p_{k,l}(x)\, Q_{k,l}(x, dx'), \tag{9}$$

where

$$p_{k,l}(x) = \mathbb{1}_{\mathsf{X}_k}(x)\, Q(x, \mathsf{X}_l), \qquad Q_{k,l}(x, \cdot) = \frac{1}{Q(x, \mathsf{X}_l)}\, Q(x, \cdot \cap \mathsf{X}_l).$$

(An arbitrary value can be chosen for $Q_{k,l}(x, \cdot)$ when $p_{k,l}(x) = 0$ to make it a completely defined transition kernel.) The kernels $Q_{k,k}$, $k \in \mathcal{K}$, correspond to the "fixed-dimensional" part of the transition kernel $Q$, while the kernels $Q_{k,l}$, $(k, l) \in \mathcal{K}^2$, $k \neq l$, correspond to the "trans-dimensional" part.

The mixture representation (9) satisfies the assumptions of Proposition 1 with $\mathsf{M} = \mathcal{K}^2$, $W_{k,l} = \mathsf{X}_k \times \mathsf{X}_l$ for all $(k, l) \in \mathsf{M}$, and $\varphi(k, l) = (l, k)$. Therefore, if the current state $x$ is in $\mathsf{X}_k$ and the proposed state $x'$ in $\mathsf{X}_l$, the MHG ratio (8) reads

$$r(x, x') = \frac{p_{l,k}(x')\, \pi(dx')\, Q_{l,k}(x', dx)}{p_{k,l}(x)\, \pi(dx)\, Q_{k,l}(x, dx')}. \tag{10}$$

In most "tutorial" papers about the RJ-MCMC method, this expression is directly written in the special case where Green's dimension matching argument can be applied (see, e.g., [18], Sections 2.2 and 2.3). Unfortunately, the dimension matching argument does not apply directly to the commonly used Birth-or-Death kernels (see next section) if the mixture representation (9), which leads to (10), is used.

4 Birth-or-Death kernels 

4.1 Birth-or-Death kernels on (unsorted) vectors 

Let us consider the situation where a point $x \in \mathsf{X}$ describes a set of $k$ objects $s_1, \dots, s_k \in \mathsf{S}$, with $(\mathsf{S}, \nu)$ an atomless² measure space and $k \in \mathbb{N}$. One possible (and commonly used) way of representing this is to consider pairs $(k, s)$, where the objects $s_i$, $1 \le i \le k$, have been arranged in a vector $s = (s_1, \dots, s_k) \in \mathsf{S}^k$. The corresponding space is $\mathsf{X} = \bigcup_{k \ge 0} \mathsf{X}_k$, with $\mathsf{X}_k = \{k\} \times \mathsf{S}^k$ and the convention that $\mathsf{S}^0 = \{\emptyset\}$.

Remark The results that will be presented in this section are easily generalized if the model includes additional (fixed-dimensional) parameters that are left unchanged by the Birth-or-Death moves (for instance the parameters $\Lambda$ and $\delta^2$ in a fully Bayes version of the model presented in Section 5).

Birth-or-Death kernels are the most natural kind of trans-dimensional moves in such spaces. Given $k \in \mathbb{N}$, $s = (s_1, \dots, s_k) \in \mathsf{S}^k$ and $s^* \in \mathsf{S}$, we introduce the notations

$$s_{-i} = (s_1, \dots, s_{i-1}, s_{i+1}, \dots, s_k) \in \mathsf{S}^{k-1},$$

$$s \oplus_i s^* = (s_1, \dots, s_{i-1}, s^*, s_i, \dots, s_k) \in \mathsf{S}^{k+1},$$

where $1 \le i \le k$ in the first case and $1 \le i \le k+1$ in the second case. Starting from $x = (k, s)$, a birth move inserts a new component $s^* \in \mathsf{S}$, generated according to some proposal distribution $q(s)\, \nu(ds)$, at a randomly selected location:

$$Q_b(x, \cdot) = \frac{1}{k+1} \sum_{i=1}^{k+1} \int_{\mathsf{S}} \delta_{(k+1,\, s \oplus_i s^*)}(\cdot)\, q(s^*)\, \nu(ds^*). \tag{11}$$

A death move, on the contrary, removes a randomly selected component from the current state:

$$Q_d(x, \cdot) = \frac{1}{k} \sum_{i=1}^{k} \delta_{(k-1,\, s_{-i})}(\cdot). \tag{12}$$

Finally, the Birth-or-Death kernel is a mixture of the two:

$$Q(x, \cdot) = p_b(x)\, Q_b(x, \cdot) + p_d(x)\, Q_d(x, \cdot), \tag{13}$$

with $p_b(x) \ge 0$, $p_d(x) \ge 0$, $p_b(x) + p_d(x) = 1$, and $p_d\big((0, \emptyset)\big) = 0$.
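The notations $s \oplus_i s^*$ and $s_{-i}$, and the proposal (11)-(13), translate directly into list operations. The minimal sketch below makes two assumptions for illustration. First, $p_b$ depends on $x$ only through $k$. Second, the user supplies a sampler for $q$. The names `oplus`, `minus` and `propose` are ours, and indices are 1-based to match the text.

```python
import random

def oplus(s, i, s_star):
    """s (+)_i s*: s_star becomes the i-th component (1-based, 1 <= i <= k+1)."""
    return s[:i - 1] + [s_star] + s[i - 1:]

def minus(s, i):
    """s_{-i}: remove the i-th component (1-based, 1 <= i <= k)."""
    return s[:i - 1] + s[i:]

def propose(s, p_birth, sample_q, rng):
    """Draw x' ~ Q(x, .) as in (11)-(13); return (new vector, (move, i)).

    p_birth(k) plays the role of p_b(x) and must equal 1 when k == 0.
    """
    k = len(s)
    if rng.random() < p_birth(k):
        i = rng.randint(1, k + 1)      # uniform over the k+1 insertion positions
        return oplus(s, i, sample_q(rng)), ("birth", i)
    i = rng.randint(1, k)              # uniform over the k existing components
    return minus(s, i), ("death", i)

rng = random.Random(1)
x_new, move = propose([0.2, 0.7], lambda k: 1.0 if k == 0 else 0.5,
                      lambda g: g.random(), rng)
print(x_new, move)
```

Requiring `p_birth(0) == 1` encodes the constraint $p_d((0, \emptyset)) = 0$. Without it, a death could be proposed from the empty state.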

4.2 Expression of the MHG ratio 

The following proposition provides the expression of the MHG ratio for the model and kernel 
described in Section 4.1. 

² See, e.g., [14]. As a concrete example, think of $\mathsf{S} = \mathbb{R}^d$ endowed with its usual Borel $\sigma$-algebra and $\nu$ equal to Lebesgue measure. We will use the following property in the proof of Proposition 2: if $(\mathsf{S}, \nu)$ is atomless, then the diagonal $\Delta = \{(s, s) : s \in \mathsf{S}\}$ is $\nu \otimes \nu$-negligible in $\mathsf{S} \times \mathsf{S}$.
Proposition 2. Assume that, for all $k \ge 1$, the target measure $\pi$ restricted to $\mathsf{X}_k$ admits a density $f_k$ with respect to $\nu^{\otimes k}$. Then the MHG ratio is

$$r(x, x') = \frac{f_{k+1}(x')}{f_k(x)}\, \frac{p_d(x')}{p_b(x)}\, \frac{1}{q(s^*)} \tag{14}$$

for a birth move from $x = (k, s)$ to $x' = (k+1, s \oplus_i s^*)$.

Proof. Although a direct computation of the MHG ratio would be possible based on Equations (11)-(13), we find it much more illuminating to deduce the result from Proposition 1, using kernels which are simpler than $Q_b$ and $Q_d$. To do so, let us consider the family of elementary kernels $Q_m$, with $m$ in the index set

$$\mathsf{M} = \{(a, k, i) \in \{0, 1\} \times \mathbb{N} \times \mathbb{N} : 1 \le i \le k + a\},$$

where $Q_{1,k,i}$ is the kernel from $\mathsf{X}_k$ to $\mathsf{X}_{k+1}$ that inserts a new component $s^* \sim q(s)\, \nu(ds)$ in position $i$, and $Q_{0,k,i}$ is the kernel from $\mathsf{X}_k$ to $\mathsf{X}_{k-1}$ that removes the $i$-th component. Then we can write

$$Q(x, \cdot) = \sum_{m \in \mathsf{M}} j(x, m)\, Q_m(x, \cdot), \tag{15}$$

with $j(x, m)$ defined for all $x = (k, s) \in \mathsf{X}$ as

$$j(x, m) = \begin{cases} p_b(x)/(k+1) & \text{if } m = (1, k, i),\ 1 \le i \le k+1, \\ p_d(x)/k & \text{if } m = (0, k, i),\ 1 \le i \le k, \\ 0 & \text{otherwise.} \end{cases}$$

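Denote by $\tilde{\mathsf{X}}_k$ the set of all $x \in \mathsf{X}_k$ in which no two components are equal. For all $k$, $\pi(\mathsf{X}_k \setminus \tilde{\mathsf{X}}_k) = 0$. This holds because $\pi|_{\mathsf{X}_k}$ admits a density with respect to the product measure $\nu^{\otimes k}$, and the diagonal of $\mathsf{S} \times \mathsf{S}$ is $\nu \otimes \nu$-negligible. For the same reason, a newly proposed $s^*$ almost surely differs from all components of $s$. The mixture representation (15) thus satisfies the assumptions of Proposition 1 with

$$W_{(1,k,i)} = \big\{(x, x') \in \tilde{\mathsf{X}}_k \times \tilde{\mathsf{X}}_{k+1} : \exists s \in \mathsf{S}^k,\ \exists s^* \in \mathsf{S},\ x = (k, s),\ x' = (k+1, s \oplus_i s^*)\big\},$$

$$W_{(0,k,i)} = W'_{(1,k-1,i)},$$

$\varphi(1, k, i) = (0, k+1, i)$ and $\varphi(0, k, i) = (1, k-1, i)$. The sets $W_{(1,k,i)}$, $1 \le i \le k+1$, are disjoint: two different insertion positions can only map the same $x$ to the same $x'$ if $x'$ has two equal components.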

According to Proposition 1, the MHG ratio for a birth move $m = (1, k, i)$ is thus

$$r(x, x') = \frac{p_d(x')/(k+1)}{p_b(x)/(k+1)}\, \frac{\pi(dx')\, Q_{0,k+1,i}(x', dx)}{\pi(dx)\, Q_{1,k,i}(x, dx')}.$$

Observe that the $1/(k+1)$ terms in the move selection probabilities cancel each other. To complete the proof, it remains to show that

$$\frac{\pi(dx')\, Q_{0,k+1,i}(x', dx)}{\pi(dx)\, Q_{1,k,i}(x, dx')} = \frac{f_{k+1}(x')}{f_k(x)}\, \frac{1}{q(s^*)}. \tag{16}$$

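This can be obtained, in the general case³, by a direct computation of the densities with respect to the symmetric measure

$$\zeta\big(d(k, s), dx'\big) = \nu^{\otimes k}(ds) \left[ \sum_{i=1}^{k} \delta_{(k-1,\, s_{-i})}(dx') + \sum_{i=1}^{k+1} \int_{\mathsf{S}} \delta_{(k+1,\, s \oplus_i s^*)}(dx')\, \nu(ds^*) \right].$$

On $W_{(1,k,i)}$, the measures $\pi(dx)\, Q_{1,k,i}(x, dx')$ and $\pi(dx')\, Q_{0,k+1,i}(x', dx)$ have densities $f_k(x)\, q(s^*)$ and $f_{k+1}(x')$, respectively, with respect to $\zeta$, which yields (16). □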



³ In the important special case where $\mathsf{S} \subset \mathbb{R}^d$ and $\nu$ is (the restriction of) the $d$-dimensional Lebesgue measure, (16) can be simply seen as the result of Green's dimension matching argument [17, Section 3.3], in a very simple case where the Jacobian is equal to one.
We emphasize that (15) is not the usual mixture representation of trans-dimensional kernels introduced in Section 3.2. Indeed, starting, e.g., from $\mathsf{X}_k$, there are several elementary kernels that can propose a point in $\mathsf{X}_{k+1}$. This shows the usefulness of Proposition 1, which provides sufficient conditions for (8) to hold beyond the case of the usual mixture representation (9).
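Proposition 2 can be exercised on a toy target for which the answer is known. As a hypothetical example (not the sinusoid model of Section 5), take $\mathsf{S} = (0, 1)$ with Lebesgue measure $\nu$, $k \sim \text{Poisson}(\lambda)$, and i.i.d. uniform components. Then $f_k(x) = e^{-\lambda} \lambda^k / k!$ on $\mathsf{X}_k$, and with $q$ uniform, (14) reduces to $\frac{\lambda}{k+1} \frac{p_d(x')}{p_b(x)}$.

The sketch runs the Birth-or-Death sampler twice: once with (14), and once with a spurious extra factor $1/(k+1)$ that mimics the error discussed in Section 5. The first run should average about $\lambda$. The second targets a distribution proportional to $\lambda^k / (k!)^2$, whose mean is about 1.45 for $\lambda = 3$. The function names and settings are ours.

```python
import math
import random

def p_birth(k):
    """Hypothetical choice of p_b(x): always propose a birth from the empty state."""
    return 1.0 if k == 0 else 0.5

def birth_ratio(k, lam, buggy=False):
    """Ratio (14) for a birth k -> k+1 on the toy target, with q uniform on (0, 1)."""
    r = (lam / (k + 1)) * (1.0 - p_birth(k + 1)) / p_birth(k)  # f_{k+1}/f_k * p_d/p_b / q
    if buggy:
        r /= k + 1  # spurious 1/(k+1) factor
    return r

def mean_k(lam, n_iter, buggy=False, seed=0):
    """Run the Birth-or-Death sampler and return the average number of components."""
    rng = random.Random(seed)
    s = []                                    # unsorted vector of components
    total = 0
    for _ in range(n_iter):
        k = len(s)
        if rng.random() < p_birth(k):         # birth move, cf. (11)
            s_star, i = rng.random(), rng.randint(0, k)
            if rng.random() < min(1.0, birth_ratio(k, lam, buggy)):
                s.insert(i, s_star)
        else:                                 # death move, cf. (12): reverse of a birth from k-1
            i = rng.randrange(k)
            if rng.random() < min(1.0, 1.0 / birth_ratio(k - 1, lam, buggy)):
                s.pop(i)
        total += len(s)
    return total / n_iter

m_correct = mean_k(3.0, 100_000)
m_buggy = mean_k(3.0, 100_000, buggy=True)
print(m_correct, m_buggy)   # roughly 3 and roughly 1.45
```

The death ratio is computed as the inverse of the birth ratio of the reverse move. This keeps the pair of moves consistent, as required by (8).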

4.3 Birth-or-Death kernels on sorted vectors 

Let us assume now that the objects are "sorted", in some sense, before being arranged in the vector $s = (s_1, \dots, s_k) \in \mathsf{S}^k$. In practice, this happens in two situations. The first is when there is a natural ordering on the set of objects (e.g., the jump times in signal segmentation or multiple change-point problems [17, 34]). The second is when artificial constraints are introduced to restore identifiability in the case of exchangeable components (see [9, 23, 35, 36, 46] for the case of mixture models).

To formalize this, let us consider the same space $\mathsf{X}$ as in Section 4.1. Assume that $\mathsf{S}$ is endowed with a total order and that the corresponding "sort function" $\Psi : \mathsf{X} \to \mathsf{X}$ is measurable. What we are assuming now is that the target measure, denoted by $\tilde{\pi}$ in this section, is supported by $\Psi(\mathsf{X})$; in other words, the components of $x \in \mathsf{X}$ are $\tilde{\pi}$-almost surely sorted.

In such a setting, the definition of the Birth-or-Death kernel has to be slightly modified in 
order to accommodate the sort constraint: the death kernel is unchanged, but new components are 
inserted deterministically at the only location that makes the resulting vector sorted (instead of 
being added at a random location). Mathematically, for $x = (k, s) \in \mathsf{X}_k$, we now have:

$$Q_b(x, \cdot) = \sum_{i=1}^{k+1} \int_{\mathsf{S}} \mathbb{1}_{\Psi(\mathsf{X})}\big(k+1, s \oplus_i s^*\big)\, \delta_{(k+1,\, s \oplus_i s^*)}(\cdot)\, q(s^*)\, \nu(ds^*).$$

Proceeding as in the proof of Proposition 2, it can be proved that the MHG ratio for a birth move from $x = (k, s)$ to $x' = (k+1, s \oplus_i s^*)$ is

$$r(x, x') = \frac{\tilde{f}_{k+1}(x')}{\tilde{f}_k(x)}\, \frac{p_d(x')/(k+1)}{p_b(x)\, \eta_i(x)}\, \frac{1}{q(s^*)/\eta_i(x)}, \tag{17}$$

where $\tilde{f}_k$ denotes the pdf of $\tilde{\pi}$ on $\mathsf{X}_k$, and $\eta_i(x)$ the probability that $s^* \sim q(s)\, \nu(ds)$ is inserted at location $i$ in $x$. (Note that $p_b(x)\, \eta_i(x)$ is the probability of performing a birth move at location $i$, and $p_d(x')/(k+1)$ the probability of the reverse death move. This is the appropriate way of decomposing this kernel as a mixture in order to use Proposition 1.)

Let us now consider the case where, in the setting of Section 4.1, the target probability measure $\pi$ is invariant under permutations of the component indices (in other words, the corresponding random variables are exchangeable [8, Chapter 4]). Sorting the components (as an identifiability device) is equivalent to looking at the image measure $\tilde{\pi} = \pi \circ \Psi^{-1}$, which has the pdf $\tilde{f}_k = k!\, f_k\, \mathbb{1}_{\Psi(\mathsf{X})}$ on $\mathsf{X}_k$. As a consequence, the MHG ratios (14) and (17) are equal. The factors $\eta_i(x)$ cancel in (17), and $\tilde{f}_{k+1}(x') / \tilde{f}_k(x) = (k+1)\, f_{k+1}(x') / f_k(x)$ cancels the factor $1/(k+1)$.

Remark Another option, when the components of the vector $(s_1, \dots, s_k)$ are exchangeable, is to forget about the indices and consider the set $\{s_1, \dots, s_k\}$ instead. The object of interest is then a (random) finite set of points in $\mathsf{S}$, in other words, a point process on $\mathsf{S}$. The expression of the MHG ratio for Birth-or-Death moves in the point process framework, with the Poisson point process as a reference measure, has been given in [15] (one year before the publication of Green's paper [17]). Point processes have been widely used, since then, in image processing and object identification (see, e.g., [12, 24, 41, 47]).
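The equality between (17) and (14) under exchangeability can be checked numerically on a hypothetical toy target. The setting is $\mathsf{S} = (0, 1)$ with Lebesgue $\nu$, $q$ uniform, and $k \sim \text{Poisson}(\lambda)$ with i.i.d. uniform components. For a sorted vector, the birth location is found by binary search, and $\eta_i(x)$ is the length of the gap that receives $s^*$. The function names are ours.

```python
import bisect
import math

def log_f(k, lam):
    """Toy unsorted density f_k: k ~ Poisson(lam), components i.i.d. uniform on (0, 1)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def log_f_sorted(k, lam):
    """Image density on sorted vectors: k! * f_k on the sorted region."""
    return log_f(k, lam) + math.lgamma(k + 1)

def insertion_gap(s, s_star):
    """1-based position i keeping s sorted, and eta_i(x) = gap length under uniform q."""
    j = bisect.bisect_left(s, s_star)      # 0-based insertion index
    lo = s[j - 1] if j > 0 else 0.0
    hi = s[j] if j < len(s) else 1.0
    return j + 1, hi - lo

def ratio_17(s, s_star, lam, pb, pd_new):
    """Birth ratio (17) on sorted vectors."""
    k = len(s)
    _, eta = insertion_gap(s, s_star)
    q = 1.0                                # uniform proposal density on (0, 1)
    return (math.exp(log_f_sorted(k + 1, lam) - log_f_sorted(k, lam))
            * (pd_new / (k + 1)) / (pb * eta) / (q / eta))

def ratio_14(k, lam, pb, pd_new):
    """Birth ratio (14) on unsorted vectors."""
    q = 1.0
    return math.exp(log_f(k + 1, lam) - log_f(k, lam)) * pd_new / pb / q

s = [0.1, 0.4, 0.8]
print(insertion_gap(s, 0.5))            # position 3, gap of length about 0.4
print(ratio_17(s, 0.5, 3.0, 0.5, 0.5), ratio_14(3, 3.0, 0.5, 0.5))  # both about 0.75
```

The result does not depend on which gap receives $s^*$. That gap only affects $\eta_i(x)$, which cancels.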
5 Example: joint detection and estimation of sinusoids in white Gaussian noise



The results presented in Section 4 can be used to compute the MHG ratio easily in many signal 
decomposition problems. Let us illustrate this with the joint Bayesian model selection and param- 
eter estimation of sinusoids in white Gaussian noise, as first considered by [1]. As explained in the 
introduction, this seminal paper introduced the RJ-MCMC methodology in the signal processing 
community, and at the same time introduced an erroneous expression of the MHG ratio that has 
been, since then, reproduced in a long series of papers. We follow closely the model and notations 
of [1]; the reader is referred to the original paper for more details.

Let $y = (y_1, y_2, \dots, y_N)^T$ be a vector of $N$ observations of an observed signal. We consider the finite family of nested models $\mathcal{M}_0 \subset \mathcal{M}_1 \subset \dots \subset \mathcal{M}_{k_{\max}}$, where $\mathcal{M}_k$ assumes that $y$ is composed of $k$ sinusoids observed in white Gaussian noise. Let $\omega_k = (\omega_{1,k}, \dots, \omega_{k,k})$ and $a_k = (a_{c_1,k}, a_{s_1,k}, \dots, a_{c_k,k}, a_{s_k,k})$ be the vectors of radial frequencies and cosine/sine amplitudes under model $\mathcal{M}_k$, respectively; moreover, let $D_k$ be the corresponding $N \times 2k$ design matrix. Then, under $\mathcal{M}_k$, the observed signal $y$ follows a normal linear regression model:

$$y = D_k a_k + n,$$

where $n$ is a white Gaussian noise with variance $\sigma^2$. The unknown parameters are, then, assumed to be the number of components $k$, the component-specific parameters $\theta_k = \{a_k, \omega_k\}$ and the noise variance $\sigma^2$, which is common to all models. The joint prior distribution is chosen to have the following hierarchical structure:

$$p(k, \theta_k, \sigma^2) = p(a_k \mid k, \omega_k, \sigma^2)\, p(\omega_k \mid k)\, p(k)\, p(\sigma^2),$$

where the prior over $a_k$ is the conventional g-prior distribution [50], which is a zero-mean Gaussian with $\sigma^2 \delta^2 (D_k^T D_k)^{-1}$ as its covariance matrix. Conditional on $k$, the radial frequencies are independent and identically distributed, with a uniform distribution on $(0, \pi)$. The noise variance $\sigma^2$ is endowed with Jeffreys' improper prior, i.e., $p(\sigma^2) \propto 1/\sigma^2$. The number of components $k$ is given a Poisson distribution with mean $\Lambda$, truncated to $\{0, 1, \dots, k_{\max}\}$. The parameters $a_k$ and $\sigma^2$ can be integrated out analytically, and the resulting marginal posterior becomes, for $k \in \{0, \dots, k_{\max}\}$,

$$p(k, \omega_k \mid y) \propto \left(y^T P_k y\right)^{-N/2} \frac{\big(\Lambda / [\pi (1 + \delta^2)]\big)^k}{k!}\, \mathbb{1}_{(0, \pi)^k}(\omega_k), \tag{18}$$

with

$$P_k = I_N - \frac{\delta^2}{1 + \delta^2}\, D_k \left(D_k^T D_k\right)^{-1} D_k^T$$

when $k \ge 1$, and $P_0 = I_N$.

Inference under this hierarchical Bayesian model is carried out in [1] using an RJ-MCMC sampler on $\mathsf{X} = \bigcup_{k=0}^{k_{\max}} \{k\} \times (0, \pi)^k$ with target density (18). We only focus here on the "between-models" moves. These are Birth-or-Death moves of the kind described in Section 4.1, with a uniform density on $(0, \pi)$ as the proposal distribution of the new frequency in the birth moves.

Let us now compute the MHG ratio for a birth move. Note that the posterior density (18) is written in the case of "unsorted" components described in Sections 4.1-4.2. We shall therefore make use of Proposition 2, which assumes that the new component is inserted at a random position $i$ (all $k+1$ positions being selected with the same probability). The correct MHG ratio, for a birth move from $x = (k, \omega_k)$ to $x' = (k+1, \omega_k \oplus_i \omega^*)$, turns out to be

$$r(x, x') = \frac{p(k+1, \omega_k \oplus_i \omega^* \mid y)}{p(k, \omega_k \mid y)}\, \frac{p_d(x')}{p_b(x)}\, \frac{1}{q(\omega^*)}, \tag{19}$$

where $q$ denotes the uniform density on $(0, \pi)$. Using

$$\frac{p_d(x')}{p_b(x)} = \frac{p_0(k)}{p_0(k+1)} = \frac{k+1}{\Lambda}$$

as in [1], with $p_0$ standing for the (truncated Poisson) prior distribution of $k$, we finally find

$$r(x, x') = \left(\frac{y^T P_{k+1} y}{y^T P_k y}\right)^{-N/2} \frac{\Lambda \pi^{-1}}{(1+k)(1+\delta^2)}\, \frac{k+1}{\Lambda}\, \frac{1}{\pi^{-1}} = \left(\frac{y^T P_{k+1} y}{y^T P_k y}\right)^{-N/2} \frac{1}{1+\delta^2}. \tag{20}$$

Figure 1: The pdfs of Poisson (gray) and accelerated Poisson (black) distributions with mean $\Lambda = 5$. Both distributions are truncated to the set $\{0, \dots, 32\}$.

Note that the expression of the ratio proposed in [1, Equation (20)] differs from the one we find here by a factor $1/(k+1)$. A similar mistake in computing RJ-MCMC ratios has been reported in the field of genetics [22, 44].
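The corrected ratio (20) only involves the quadratic forms $y^T P_k y$. These can be computed with a linear solve instead of forming $P_k$ explicitly. The minimal sketch below is an illustration, not the implementation of [1]. It assumes that the columns of $D_k$ are $\cos(\omega_j t)$ and $\sin(\omega_j t)$ for $t = 0, \dots, N-1$. The function names and the `erroneous` flag (which adds the spurious factor $1/(k+1)$) are ours. Because $y^T P_k y$ does not depend on the order of the columns of $D_k$, the insertion position $i$ plays no role, and the new frequency is simply appended.

```python
import numpy as np

def design_matrix(omega, n):
    """N x 2k design matrix with columns cos(w t), sin(w t), t = 0..N-1 (assumed convention)."""
    t = np.arange(n)[:, None]
    w = np.asarray(omega, dtype=float)[None, :]
    d = np.empty((n, 2 * w.shape[1]))
    d[:, 0::2] = np.cos(w * t)
    d[:, 1::2] = np.sin(w * t)
    return d

def quad_form(y, omega, delta2):
    """y^T P_k y, with P_k = I - delta2/(1+delta2) D (D^T D)^{-1} D^T and P_0 = I."""
    if len(omega) == 0:
        return float(y @ y)
    d = design_matrix(omega, len(y))
    fitted = d @ np.linalg.solve(d.T @ d, d.T @ y)   # orthogonal projection of y onto span(D)
    return float(y @ y - delta2 / (1.0 + delta2) * (y @ fitted))

def log_birth_ratio(y, omega, omega_new, delta2, erroneous=False):
    """Log of the corrected birth ratio (20); erroneous=True adds the spurious 1/(k+1)."""
    n, k = len(y), len(omega)
    num = quad_form(y, list(omega) + [omega_new], delta2)
    den = quad_form(y, omega, delta2)
    log_r = -0.5 * n * (np.log(num) - np.log(den)) - np.log1p(delta2)
    return log_r - np.log(k + 1) if erroneous else log_r

# Noise-free single sinusoid, so that the ratio has a closed form.
t = np.arange(64)
y = 5.0 * np.cos(0.63 * t)
print(log_birth_ratio(y, [], 0.63, 100.0))     # 31 * log(101): the true frequency is favored
print(log_birth_ratio(y, [0.63], 2.0, 100.0))  # -log(101): a spurious frequency is penalized
```

Working on the log scale avoids overflow of the factor $(\cdot)^{-N/2}$ for long signals. The acceptance probability is then $\min\{1, \exp(\text{log ratio})\}$.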

In fact, using the expression of the birth ratio with an additional factor of $1/(k+1)$, as in [1], amounts to assigning a different prior distribution over $k$, called the "accelerated Poisson distribution" [44], which reads

$$p_2(k) \propto \frac{p_0(k)}{k!}. \tag{21}$$

Figure 1 illustrates the difference between the accelerated (black) and the usual (gray) Poisson distributions when the mean is $\Lambda = 5$. It can be observed that the accelerated Poisson distribution (21) puts a stronger emphasis on "sparse" models, i.e., models with a small number of components.
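The two priors compared in Figure 1 are easy to tabulate. The sketch below normalizes the truncated Poisson prior $p_0$ and the accelerated Poisson prior (21) on $\{0, \dots, 32\}$ with $\Lambda = 5$, working in log space. It is an illustration only, not the code used to draw the figure, and the helper names are ours.

```python
import math

def truncated_pmf(log_weight, k_max):
    """Normalize exp(log_weight(k)) over k = 0, ..., k_max."""
    logs = [log_weight(k) for k in range(k_max + 1)]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    z = sum(w)
    return [wi / z for wi in w]

def mean(pmf):
    return sum(k * p for k, p in enumerate(pmf))

lam, k_max = 5.0, 32
p0 = truncated_pmf(lambda k: k * math.log(lam) - math.lgamma(k + 1), k_max)
# Accelerated Poisson (21): p_2(k) proportional to p_0(k) / k!
p2 = truncated_pmf(lambda k: k * math.log(lam) - 2.0 * math.lgamma(k + 1), k_max)

print(f"Poisson: mean {mean(p0):.2f}")       # about 5.00
print(f"accelerated: mean {mean(p2):.2f}")   # about 1.97
```

With $\Lambda = 5$, the accelerated prior has its mode at $k = 2$ and puts more than 90% of its mass on $k \le 3$. The truncated Poisson prior puts only about a quarter of its mass there.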

Let us consider an experiment in which the observed signal of length $N = 64$ consists of $k = 3$ sinusoidal components with the radial frequencies $\omega_k = (0.63, 0.68, 0.73)^T$ and amplitudes $\sqrt{a_{c_i,k}^2 + a_{s_i,k}^2} = (20, 6.32, 20)^T$, $1 \le i \le k$. The signal-to-noise ratio, defined as $\mathrm{SNR} = \|D_k a_k\|^2 / (N \sigma^2)$, is set to a moderate value of 7 dB. Samples from the posterior distribution of $k$ are obtained using the RJ-MCMC sampler of [1], with an inverse Gamma prior $\mathcal{IG}(2, 100)$ on $\delta^2$ and a Gamma prior $\mathcal{G}(1, 10^{\dots})$ on $\Lambda$. For each observed signal in 100 replications of the experiment, the sampler was run twice: once with the correct expression of the ratio, given by (20), and once with the erroneous expression from [1]. Figure 2 shows the frequency of selection of each model under both the Poisson and the accelerated Poisson distribution as a prior for $k$. It appears that the (unintended) use of the accelerated Poisson distribution, induced by the erroneous expression of the MHG ratio, can result in a significant shift to the left of the posterior distribution of $k$.

Figure 2: Frequency of selection for each model $\mathcal{M}_k$ for 100 replications of the experiment described in Section 5, using the expression of the ratio given in [1, Equation (20)] (black) and the corrected ratio (20) (gray). There are $k = 3$ sinusoidal components in the observed signal $y$ and the SNR is 7 dB. 100k samples were generated using the RJ-MCMC sampler, and the first 20k were discarded as the burn-in period.

Remark Working with "sorted" vectors of frequencies would be quite natural in this problem, 
since the frequencies are exchangeable under the posterior (18). As explained in Section 4.3, the 
expression of the MHG ratio would be the same. 

Remark The reason why the MHG ratio in [1] is wrong can be understood from a subsequent 
paper [4], where the same computation is explained in greater detail. There we can see that the 
authors, working with an "unsorted vector" representation, consider that the new component in a 
birth move is inserted at the end. The death move, however, is defined as in the present paper: a 
sinusoid to be removed is selected randomly among the existing components. Here is the mistake: 
if the new component is inserted at the end during a birth move, then any attempt at removing 
a component which is not the last one should be rejected during a death move. In other words, 
the acceptance probability should be zero when any component but the last one is picked to be 
removed during a death move. 



6 Conclusion 

The computation of MHG ratios is a delicate matter involving measure-theoretic considerations, 
for which practitioners need clear mathematical statements that can be used "out of the box". 
Such a statement has been available for a long time in the classical fixed-dimensional Metropolis- 
Hastings sampler, and more recently provided by Green [17] for trans-dimensional moves that 
comply with the assumptions of his dimension matching argument. 

In this note, we have provided the expression of the MHG ratio for Birth-or-Death moves, using a general result for mixtures of proposal kernels, and corrected the erroneous expression provided by [1]. A similar correction has to be applied to the ratios used in the long series of signal processing papers [3-6, 10, 20, 25-27, 30, 31, 40, 42, 43] that have been found to contain the same mistake.

While writing this note, we discovered that a very similar mistake had been detected and corrected in the field of genetics by [22], from which we borrow our concluding words: "The fact that this error has remained in the literature for over 5 years [12 years in the present case] underscores the view that while Bayesian analysis using Markov chain Monte Carlo is incredibly flexible and therefore powerful, the devil is in the details. Furthermore, incorrect analyses can give results that seem quite reasonable."